Unsupervised Language-Independent Name Translation Mining from Wikipedia Infoboxes

نویسندگان

  • Wen-Pin Lin
  • Matthew Snover
  • Heng Ji
چکیده

The automatic generation of entity profiles from unstructured text, such as Knowledge Base Population, if applied in a multi-lingual setting, generates the need to align such profiles from multiple languages in an unsupervised manner. This paper describes an unsupervised and language-independent approach to mine name translation pairs from entity profiles, using Wikipedia Infoboxes as a stand-in for high quality entity profile extraction. Pairs are initially found using expressions that are written in language-independent forms (such as dates and numbers), and new translations are then mined from these pairs. The algorithm then iteratively bootstraps from these translations to learn more pairs and more translations. The algorithm maintains a high precision, over 95%, for the majority of its iterations, with a slightly lower precision of 85.9% and an f-score of 76%. A side effect of the name mining algorithm is the unsupervised creation of a translation lexicon between the two languages, with an accuracy of 64%. We also duplicate three state-of-the-art name translation mining methods and use two existing name translation gazetteers to compare with our approach. Comparisons show our approach can effectively augment the results from each of these alternative methods and resources.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

WikiNet: A Very Large Scale Multi-Lingual Concept Network

This paper describes a multi-lingual concept network obtained automatically by mining for concepts and relations and exploiting a variety of sources of knowledge from Wikipedia. Concepts and their lexicalizations are extracted from Wikipedia pages. Relations are extracted from the category and page network, infoboxes and the body of the articles. The network consists of a central, language inde...

متن کامل

Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia

Wikipedia infoboxes are a valuable source of structured knowledge for global knowledge sharing. However, infobox information is very incomplete and imbalanced among the Wikipedias in different languages. It is a promising but challenging problem to utilize the rich structured knowledge from a source language Wikipedia to help complete the missing infoboxes for a target language. In this paper, ...

متن کامل

WRPA: A System for Relational Paraphrase Acquisition from Wikipedia

In this paper we present WRPA, a system for Relational Paraphrase Acquisition from Wikipedia. WRPA extracts paraphrasing patterns that hold a particular relation between two entities taking advantage of Wikipedia structure. What is new in this system is that Wikipedia’s exploitation goes beyond infoboxes, reaching itemized information embedded in Wikipedia pages. WRPA is language independent, a...

متن کامل

WRPA: A System for Relational Paraphrase Acquisition from Wikipedia∗ WRPA: Un sistema para la adquisición de paráfrasis de relaciones de la Wikipedia

In this paper we present WRPA, a system for Relational Paraphrase Acquisition from Wikipedia. WRPA extracts paraphrasing patterns that hold a particular relation between two entities taking advantage of Wikipedia structure. What is new in this system is that Wikipedia’s exploitation goes beyond infoboxes, reaching itemized information embedded in Wikipedia pages. WRPA is language independent, a...

متن کامل

Updating Wikipedia via DBpedia Mappings and SPARQL

DBpedia crystallized most of the concepts of the Semantic Web using simple mappings to convert Wikipedia articles (i.e., infoboxes and tables) to RDF data. This “semantic view” of wiki content has rapidly become the focal point of the Linked Open Data cloud, but its impact on the original Wikipedia source is limited. In particular, little attention has been paid to the benefits that the semanti...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011